Abstract: Owing to their diverse medicinal effects, polyphenols are among the most 
intensively studied natural products. However, it is a great challenge to elucidate the 
polypharmacological mechanisms of polyphenols. To address this challenge, we establish a 
method for identifying multiple targets of chemical agents through analyzing the module 
profiles of gene expression upon chemical treatments. Using the FABIA algorithm, we 
performed a biclustering analysis of gene expression profiles derived from the Connectivity 
Map (cMap) and clustered the profiles into 49 gene modules. This allowed us to define 
a 49-dimensional binary vector to characterize the gene module profile of each agent, by which we 
can compare the expression profiles of each pair of chemical agents using the Tanimoto 
coefficient. For agent pairs with similar gene expression profiles, we can predict the 
targets of one agent from those of the other. Drug target enrichment analysis indicated that this 
method is effective in predicting the multiple targets of chemical agents. Using this method, 
we identified 148 targets for 20 polyphenols contained in cMap. A large part of these targets 
has been validated by experimental observations. The results show that the medicinal effects of 
polyphenols go far beyond their well-known antioxidant activities. This method is also 
applicable to dissecting the polypharmacology of other natural products. 
1. Introduction 

Since reactive oxygen species (ROS), e.g., superoxide radical, hydrogen peroxide, and hydroxyl 
radical, are involved in the pathogenesis of many diseases, such as cancer, neurodegenerative diseases 
and atherosclerosis [1], antioxidants, in particular polyphenolic antioxidants, have been widely 
expected to exert prophylactic or therapeutic effects on these diseases [2-5]. However, a large number 
of studies have indicated that the strong in vitro antioxidant activities of polyphenols cannot be 
translated into in vivo therapeutic effects [5-9]. This antioxidant paradox has primarily been explained by 
the poor bioavailability of exogenous polyphenols [10]. Our analysis of the biological roles of 
polyphenols revealed that they evolved to filter UV light rather than to scavenge intense ROS, 
which provides an evolutionary explanation for the weak in vivo radical-scavenging potential of 
polyphenols [11]. Evolutionary considerations also suggest that natural polyphenols have evolved 
an excellent scaffold with well-balanced rigidity and flexibility to adapt to the different structures of 
enzymes in the biosynthetic pipeline, which enables the compounds to bind various proteins [12]. This 
finding implies that natural polyphenols have an inherent potential to exert polypharmacological effects 
other than redox modulation [13]. However, elucidating the polypharmacological mechanisms of 
natural polyphenols is a great challenge, because the conventional methods for dissecting drug mode of 
action (MoA) are laborious and low-throughput [14]. 

Recently, gene expression-based analysis has shown great potential in identifying drug targets [15-17]. 
However, the existing methods for gene expression profile analysis normally use a limited set of signature genes 
(usually corresponding to ~500 probes out of 22,000+) and thus lose valuable information. In addition, 
these methods are suited to revealing a single MoA or target for a given drug, rather than its 
polypharmacological mechanisms [16]. Since gene expression signatures related to different biological 
activities cluster into different modules [18], we speculate that the polypharmacological mechanisms 
of polyphenols may be better dissected in terms of module profiles of gene expression. 

In a previous analysis of the Connectivity Map (cMap), which contains 7056 expression profiles of 
5 different human cell lines treated with 1309 agents (including 20 polyphenols), we generated 49 gene 
modules using the biclustering approach FABIA (factor analysis for bicluster acquisition) [19]. 
By analyzing the biological functions of the modules, we revealed that some polyphenols exert 
polypharmacological effects by activating transcription factors, such as estrogen receptors, 
nuclear factor (erythroid-derived 2)-like 2, and peroxisome proliferator-activated receptor gamma. 
In this study, we first establish a gene module-based target identification method and then use this 
method to further elucidate the polypharmacological mechanisms of the 20 polyphenols. 

2. Results and Discussion 

In prior research, the 1309 cMap-derived agents and their expression profiles were grouped into 
49 gene modules by the FABIA algorithm [19]. These modules comprise 5921 probes, far more than are 
used in conventional microarray analysis [15,16]. Thus, each chemical agent in cMap has a gene 
module profile, defined as a 49-dimensional binary vector in which 1 or 0 represents the 
presence or absence of each module (Table S1). This allows us to calculate the Tanimoto coefficient for each 
pair of compounds to characterize the similarity of their gene expression profiles. The larger the 
Tanimoto coefficient, the more similar the biological effects of the two compounds are expected to be. For 
compound pairs with similar gene module profiles, if the MoA and/or target information is known for one compound, 
we can predict the medicinal behavior of the other. A total of 856,086 pairwise Tanimoto coefficients 
were calculated for the 1309 compounds in the cMap dataset (Table S2). The top 1% and 5% of 
coefficients are higher than 0.45 and 0.33, respectively (Figure 1). 
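
As a quick consistency check on the count quoted above, the number of unordered pairs among n agents is n(n - 1)/2. For the 1309 cMap agents this gives:

```latex
\binom{1309}{2} = \frac{1309 \times 1308}{2} = 856{,}086
```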

Figure 1. Cumulative frequency (F(x)) of pairwise Tanimoto coefficients for 1309 agents. 
To evaluate the effectiveness of this parameter in target identification, we performed a target 
enrichment test. First, by searching DrugBank [20] and the Therapeutic Target Database (TTD) [21], we 
retrieved 573 approved drugs among the 1309 agents, which hit 536 targets. We then found that 209 targets 
were shared by at least two drugs. These targets and the corresponding 476 drugs can be used to assess the 
significance of target enrichment. Although the drug targets collected by DrugBank and TTD may be 
incomplete and may include indirect targets, this information has been successfully used in previous 
studies to evaluate target enrichment efficiency [16]. A total of 113,050 pairwise Tanimoto coefficients were 
calculated for the 476 drugs. The drug pairs with Tanimoto coefficients higher than 0.33 were used 
to estimate the probability of target sharing by a hypergeometric test. The results showed that 78 targets 
of 128 drugs were enriched (q < 0.05) (Table S3). It is noteworthy that 96 of the 128 drugs have multiple 
targets (>2), for which the average ratio of target enrichment reaches 68.75% (66/96) (Table S3). In 
particular, the 7 targets of chlorpromazine, 8 targets of maprotiline, and 14 targets of imipramine were 
completely enriched (Table S3). Thus, the present method has great potential to predict the MoA and targets of 
chemical agents, and especially to dissect the polypharmacological mechanisms of natural products. 

The 1309 cMap-derived agents include four kinds of polyphenols, i.e., flavonoids (16 agents), 
monolignols (2 agents), stilbenoids (1 agent), and phenylpropanoids (1 agent). The gene module profiles 
of these polyphenols show that they are involved in more gene modules than other agents (14.85 ± 4.80 
vs. 11.85 ± 5.42, p < 0.01, t-test), suggesting that polyphenols indeed have more complex biological 
functions than the other agents. The most common modules covered by the 20 polyphenols are module 11 (with 
an occurrence of 14), module 18 (with an occurrence of 13), module 25 (with an occurrence of 13), module 7 
(with an occurrence of 12), and module 3 (with an occurrence of 12). According to the previously enriched 
biological functions of the 49 gene modules [19], the major functions associated with these modules are 
protein transport, protein location, cytoskeleton organization, cell motion, purine and pyrimidine 
metabolism, oxidative phosphorylation, cell cycle, RNA processing, ubiquitin-dependent protein catabolic 
process and translational elongation. A search in GeneDecks [22] showed that four of the 
five common modules (modules 3, 11, 18 and 25) are tightly linked to cancer and tumors (p < 0.0001). 




There are 93 drugs that are similar to the 20 polyphenols in terms of gene expression module profile 
(with Tanimoto coefficients > 0.45), which correspond to 148 targets and provide meaningful clues for 
clarifying the polypharmacology of these polyphenols (Table S4). Among the predicted medicinal effects, 
antineoplastic activity is the most common (occurring in 17 of the 93 drugs), in good agreement with the above 
finding that most of the common gene modules are linked to cancer. 
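
The prediction step described above (transfer the known targets of every drug whose module profile is sufficiently similar to that of the query compound) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy 6-module profiles and the agent and target labels are all hypothetical, and only the 0.45 cut-off is taken from the text.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary module profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(a) + sum(b) - shared
    # Assumption: two profiles with no modules at all are scored 0.
    return shared / union if union else 0.0


def predict_targets(query, profiles, known_targets, threshold=0.45):
    """Collect targets of annotated agents whose profile resembles the query's.

    Returns a dict mapping each predicted target to the list of
    (agent, Tanimoto coefficient) pairs that support it.
    """
    predicted = {}
    for agent, profile in profiles.items():
        if agent == query or agent not in known_targets:
            continue
        tc = tanimoto(profiles[query], profile)
        if tc > threshold:
            for target in known_targets[agent]:
                predicted.setdefault(target, []).append((agent, tc))
    return predicted


# Hypothetical toy data: 6 modules instead of the 49 used in the study.
profiles = {
    "polyphenol_X": [1, 1, 1, 0, 0, 0],
    "drug_1": [1, 1, 1, 1, 0, 0],
    "drug_2": [0, 0, 0, 1, 1, 1],
    "drug_3": [1, 1, 0, 0, 0, 0],
}
known_targets = {
    "drug_1": ["target_P"],
    "drug_2": ["target_Q"],
    "drug_3": ["target_P", "target_R"],
}

predicted = predict_targets("polyphenol_X", profiles, known_targets)
for target, support in predicted.items():
    print(target, support)
```

In this toy case drug_2 shares no module with the query and therefore contributes no target, whereas target_P is supported by two similar drugs.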

Tables 1-4 list the predicted targets of the four most intensively studied polyphenols, namely 
genistein (a representative component of soybean), quercetin (one of the most widely distributed 
flavonoids), resveratrol (a representative component of red wine), and (-)-catechin (a representative 
component of green tea). Antineoplastic and antihypertensive activities are the most common 
predicted activities of the four polyphenols, which agrees well with the health benefits of their dietary 
sources. For instance, accumulating evidence indicates that high soybean intake and regular green tea 
drinking are associated with low incidence rates of human cancers and hypertension [23-28]. In 
addition, a large part (50%) of the predicted targets of these polyphenols have been validated by experiments, 
and most of the validated targets (92.3%) are direct targets (Tables 1-4). These results strongly warrant the experimental 
evaluation of the other predicted targets. 

It is intriguing to note that phosphodiesterase enzymes (PDEs) and the estrogen receptor are predicted 
targets for three of the four polyphenols. This finding agrees well with the view that plant polyphenols 
collectively behave as phytoestrogens and can inhibit several isoforms of PDEs [29-31]. A major advance 
in recent natural medicine research was the identification of PDEs as targets of resveratrol [32]. The 
present analysis suggests that similar pharmacological mechanisms underlie the effects of genistein and quercetin. 



Table 1. Predicted similar drugs and associated targets of genistein. 

| Drugs | Therapeutic Uses | Targets | References |
|---|---|---|---|
| Imatinib | Antineoplastic Agents | Platelet-derived growth factor receptor; Proto-oncogene tyrosine-protein kinase ABL1 (a); Mast/stem cell growth factor receptor (a) | [33]; [34]; [35] |
| Raloxifene | Antihypocalcemic Agents | Estrogen receptor (a) | [36] |
| Iloprost | Antihypertensive Agents | Prostaglandin E2 receptor, EP2 subtype (b); cAMP-specific 3',5'-cyclic phosphodiesterase (a); Prostacyclin receptor (c) | [37]; [38] |
| Cisapride | Anti-Ulcer Agents; Gastrointestinal Agents; Prokinetic Agents | 5-Hydroxytryptamine 4 receptor | - |
| Fluticasone | Anti-inflammatory Agents | Glucocorticoid receptor (a) | [39] |
| Diethylstilbestrol | Antineoplastic Agents | Estrogen receptor | [36] |
| Finasteride | Anti-baldness Agents; Antihyperplasia Agents | Steroid-5-alpha reductase (a) | [40] |
| Sulindac sulfide | Rheumatoid arthritis | - | - |
| Prednisone | Anti-inflammatory Agents; Antineoplastic Agents | Glucocorticoid receptor (a) | [39] |
| Estradiol | Anti-menopausal Agents; Anticholesteremic Agents | Estrogen receptor (a) | [36] |
| Dydrogesterone | Progesterones | Progesterone receptor | - |

(a) as direct targets of genistein; (b) as indirect target of genistein, which increases prostaglandin release; 
(c) as indirect target of genistein, which increases prostacyclin release. 
Table 2. Predicted similar drugs and associated targets of quercetin. 

| Drugs | Therapeutic Uses | Targets | References |
|---|---|---|---|
| Tolazoline | Adrenergic alpha-Antagonists; Antihypertensive Agents; Vasodilator Agents | Alpha adrenergic receptor | - |
| Tamoxifen | Antineoplastic Agents; Bone Density Conservation Agents | Estrogen receptor (a) | [41] |
| Finasteride | Anti-baldness Agents; Antihyperplasia Agents; Skin and Mucous Membrane Agents | Steroid-5-alpha reductase | - |
| Sulindac sulfide | Rheumatoid arthritis | - | - |
| Iloprost | Antihypertensive Agents | Prostaglandin E2 receptor, EP2 subtype; cAMP-specific 3',5'-cyclic phosphodiesterase; Prostacyclin receptor | [42] |
| Raloxifene | Antihypocalcemic Agents; Bone Density Conservation Agents | Estrogen receptor (a) | [41] |
| Apomorphine | Antiparkinson Agents | Dopamine receptor (a); Adrenergic receptors; 5-Hydroxytryptamine receptor (a) | [43] |
| Fluticasone | Anti-inflammatory Agents | Glucocorticoid receptor | - |
| Tocainide | Anti-Arrhythmia Agents | Sodium channel protein type 5 subunit alpha (a) | [44] |

(a) as direct targets of quercetin. 






Table 3. Predicted similar drugs and associated targets of resveratrol. 

| Drugs | Therapeutic Uses | Targets | References |
|---|---|---|---|
| Reserpine | Antihypertensive Agents; Antipsychotic Agents | Synaptic vesicular amine transporter | - |
| Mercaptopurine | Antineoplastic Agents; Immunosuppressive Agents | Hypoxanthine-guanine phosphoribosyltransferase | - |
| Niclosamide | Antiparasitic Agents | [illegible] | - |
| Daunorubicin | Antineoplastic Agents | DNA topoisomerase | - |
| Terfenadine | Anti-Allergic Agents; Antiarrhythmic Agents | Histamine H1 receptor; Potassium voltage-gated channel subfamily H member 2 (a); Muscarinic acetylcholine receptor M3 | [45] |
| Fluphenazine | Antipsychotic Agents | Dopamine receptor | - |
| Dipyridamole | Vasodilator Agents | Adenosine deaminase; cGMP-specific 3',5'-cyclic phosphodiesterase (a) | [46] |
| Rescinnamine | Antihypertensive Agents | Angiotensin-converting enzyme (a) | [47] |
| Trifluoperazine | Antipsychotic Agents | Dopamine receptor | - |
| Metixene | Antiparkinson Agents | Muscarinic acetylcholine receptor | - |

(a) as direct targets of resveratrol. 
Table 4. Predicted similar drugs and associated targets of (-)-catechin. 

| Drugs | Therapeutic Uses | Targets | References |
|---|---|---|---|
| Letrozole | Antineoplastic Agents | Cytochrome P450 19A1 (a) | [48] |
| Triprolidine | Anti-Allergic Agents | Histamine H1 receptor | - |
| Pindolol | Antihypertensive Agents; Vasodilator Agents | Adrenergic receptor; 5-Hydroxytryptamine receptor | - |
| Norfloxacin | Anti-Bacterial Agents | DNA topoisomerase 2-alpha (a) | [48] |
| Prilocaine | Anesthetics | Sodium channel protein type 5 subunit alpha | - |
| Estradiol | Anti-menopausal Agents; Anticholesteremic Agents | Estrogen receptor (a) | [49] |
| Doxycycline | Anti-Bacterial Agents | 30S ribosomal protein | - |
| Bendroflumethiazide | Antihypertensive Agents | Solute carrier family 12 member 3; Calcium-activated potassium channel subunit alpha 1; Carbonic anhydrase | - |
| Theophylline | Bronchodilator Agents; Vasodilator Agents | Adenosine A1 receptor; cGMP-specific 3',5'-cyclic phosphodiesterase (a) | [29] |
| Naltrexone | Anti-craving Agents | Opioid receptor (a) | [50] |

(a) as direct targets of (-)-catechin. 



3. Experimental 

3.1. Tanimoto Coefficient Calculation 

The Tanimoto coefficient (TC) was calculated with a Perl program to compare the gene module profiles 
of each compound pair: 

$$TC = \frac{N_{AB}}{N_A + N_B - N_{AB}} \qquad (1)$$

where $N_A$ and $N_B$ are the numbers of bits set in the gene module profiles of compounds A and B, 
respectively, and $N_{AB}$ is the number of set bits that A and B have in common. If TC = 1, the two compounds have 
the same module profile; if TC = 0, they have totally different module profiles. 
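
The all-against-all comparison can be illustrated with a short Python sketch (the study itself used a Perl program, which is not reproduced here). The function names and the toy 6-module profiles are hypothetical, and scoring two empty profiles as 0 is an assumption made only to avoid division by zero.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient N_AB / (N_A + N_B - N_AB) for binary profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    n_a = sum(a)                                    # bits set in A
    n_b = sum(b)                                    # bits set in B
    n_ab = sum(1 for x, y in zip(a, b) if x and y)  # bits set in both
    denom = n_a + n_b - n_ab
    # Assumption: two profiles with no bits set are scored 0.
    return n_ab / denom if denom else 0.0


def pairwise_tanimoto(profiles):
    """Return {(agent_i, agent_j): TC} for every unordered pair of agents."""
    return {
        (p, q): tanimoto(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical toy profiles with 6 modules (the study uses 49).
profiles = {
    "agent_A": [1, 1, 0, 1, 0, 0],
    "agent_B": [1, 1, 0, 0, 1, 0],
    "agent_C": [0, 0, 1, 0, 0, 1],
}

scores = pairwise_tanimoto(profiles)
for pair, tc in scores.items():
    print(pair, round(tc, 3))
```

Here agent_A and agent_B share two of the four modules present in either profile, giving TC = 0.5, while agent_C shares none with the others.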

3.2. Drug Target Enrichment 

A hypergeometric test was used to assess the significance of drug target enrichment. Equation (2) 
was derived by computing the extreme tail probabilities: 

$$P(X \geq x) = \sum_{k=x}^{\min(M,\,n)} \frac{\binom{n}{k}\binom{N-n}{M-k}}{\binom{N}{M}} \qquad (2)$$

where N is the total number of approved drugs used for target enrichment (i.e., 476), M is the number of 
drugs with similar module profiles (Tanimoto coefficient > 0.33), n is the number of 
drugs sharing the same target in N, and x is the number of drugs sharing the same target in M. Thus, we 
can calculate the probability of observing, by chance, at least x occurrences of a target among those associated with 
the M drugs. The p-values were further adjusted by False Discovery Rate calculation (with the R function 
'p.adjust()' using the Benjamini-Hochberg method [51]). The enriched targets were ranked by p-value 
from most significant to least significant. Then, for each target the q-value is calculated by Equation (3): 

$$q = \frac{p \times Count}{Rank} \qquad (3)$$

where Count is the total number of enriched targets and Rank is the position of the target in the p-value ranking. The enriched targets were then selected using a 
q-value threshold of 0.05. 
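
The enrichment test can be sketched in Python with the standard library only. This is an illustration under stated assumptions, not the authors' code: the function and argument names are hypothetical, N, M, n and x follow the definitions above, and the adjustment mirrors what R's p.adjust does for the Benjamini-Hochberg method, including the step that keeps q-values monotonic in the p-value ranking.

```python
from math import comb


def hypergeom_tail(N, M, n, x):
    """P(X >= x): chance that at least x of M drugs drawn from N hit a
    target that is hit by n of the N drugs."""
    total = comb(N, M)
    upper = min(M, n)
    favourable = sum(comb(n, k) * comb(N - n, M - k) for k in range(x, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """q-values as p * Count / Rank, made monotonic from the largest p down."""
    count = len(pvalues)
    order = sorted(range(count), key=lambda i: pvalues[i])
    qvalues = [0.0] * count
    running_min = 1.0
    for rank in range(count, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * count / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical example: 476 drugs in total, 20 with a similar profile,
# 10 annotated with the target, 4 of them among the 20 similar drugs.
p = hypergeom_tail(N=476, M=20, n=10, x=4)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p, q)
```

Targets whose q-value falls below 0.05 would then be retained as enriched.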

4. Conclusions 

Natural products (NPs) have made important contributions to safeguarding human health. Not only 
did ancient humans depend on NPs to cure various diseases, but the modern pharmaceutical industry also 
benefits from NPs as a source of hits, leads and drugs [12]. Therefore, it is of great significance to elucidate the 
therapeutic mechanisms of NPs. However, this is a big challenge, because NPs usually hit multiple 
targets with relatively weak affinity and the conventional target identification methods are laborious 
and low-throughput [14]. 

In this study, we established a gene module-based target identification method. Because gene 
modules cover more gene probes, this method extracts more information than conventional microarray analysis 
methods. It therefore enables the discovery of richer information 
about the medicinal effects of chemical agents, which is very helpful for clarifying the polypharmacological 
mechanisms of polyphenols and other NPs. Moreover, this method may be used to predict targets for 
NPs beyond those contained in cMap, so it is expected to find more and more applications in the omics 
era as NP-related microarray data rapidly accumulate. 